Finding nontrivial semantic matches between database schemas
Abstract
Automation of schema matching has been under investigation for several decades, yet existing systems usually fail to find all matches or suggest incorrect ones. Because of this imperfection, schema matching is still often done manually by domain experts. With the rapidly increasing number of heterogeneous and distributed data sources in enterprises and on the web, the manual approach is more and more a limitation, and the need to automate the schema matching process is increasingly important.
This thesis describes the schema matching framework and prototype Map-IT, which is based on FlexiMatch. The schema matcher supports the multi-strategy approach, with each strategy represented as a Validator. Key characteristics of Map-IT are:
• Map-IT and its Validators can learn from previous mappings.
• Validators can easily be added to or selected from the Validator repository, in order to boost future matching performance or to adapt the system to the match task at hand.
• Current Validators exploit different aspects of database information.
• Map-IT adapts the weights of the Validators to its environment using the Meta-Learner.
An important limitation of Map-IT was that it did not search for nontrivial matches. Nor was it able to suggest matches with a complex local cardinality. One of the goals was to list and analyze what kinds of nontrivial matches exist. The thesis addresses various match problems with multiple examples and categorizes them according to similarities in the correlation between the attribute semantics. The freedom and variety of database modeling further complicate the schema matching problem.
The substring match category represents matches in which the instance data of the matching attributes contain duplicate substrings that can be separated by delimiting characters. These matches can be spread out over more attributes and may have a partial semantic overlap.
No existing approach was found that solves this type of match problem. A new approach is developed that searches for likely linked record pairs while coping with schema misalignment and bad duplicates such as ambiguous words and stop words. For each record pair, accompanying transformation functions are generated, which contain string split and concatenate operations. From the set of transformation functions, likely substring matches are mined using a clustering technique, and a similarity value is calculated for each match. Each match is assigned a set of transformation functions. If a specific match has alternative transformation functions, a ranked list …
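The idea of a transformation function built from split and concatenate operations can be illustrated with a minimal Python sketch. It is not the Map-IT algorithm: the function names, the delimiter set, and the sample records are assumptions made for illustration, and the sketch leaves out the record-pair search, stop-word handling, clustering, and ranking that the abstract mentions.

```python
import re

# Hypothetical delimiter set; the thesis does not specify one in this abstract.
DELIMITERS = r"[,\s;/-]+"


def tokens(value):
    """Split an attribute value on delimiting characters, dropping empty pieces."""
    return [t for t in re.split(DELIMITERS, value) if t]


def derive_transformation(source_record, target_value):
    """Describe target_value as a concatenation of tokens from source attributes.

    Returns a list of (attribute, token_index) steps, or None when some
    target token does not occur in any source attribute.
    """
    source_tokens = {attr: tokens(val) for attr, val in source_record.items()}
    steps = []
    for wanted in tokens(target_value):
        for attr, toks in source_tokens.items():
            if wanted in toks:
                steps.append((attr, toks.index(wanted)))
                break
        else:
            # No source attribute contains this token: no substring match.
            return None
    return steps


def apply_transformation(steps, source_record, joiner=" "):
    """Apply the split-and-concatenate steps to another source record."""
    source_tokens = {attr: tokens(val) for attr, val in source_record.items()}
    return joiner.join(source_tokens[attr][i] for attr, i in steps)


# Illustrative records (invented for this example).
linked_source = {"first_name": "Ada", "last_name": "Lovelace", "city": "London"}
steps = derive_transformation(linked_source, "Lovelace, Ada")
other_source = {"first_name": "Alan", "last_name": "Turing", "city": "London"}
print(apply_transformation(steps, other_source, joiner=", "))
```

A function derived from one linked record pair is only a candidate; in the approach described above, many such candidates would be collected and grouped before a match is suggested.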
Similar resources
Searching XML Databases for Semantically-related Schemas
In this paper, we address the problem of searching schema databases for semantically-related schemas. We first give a method of finding semantic similarity between pair-wise schemas based on tokenization, part-of-speech tagging, word expansion, and ontology matching. We then address the problem of indexing the schema database through a semantic hash table. Matching schemas in the database are f...
Class Structures and Lexical Similarities of Class Names for Ontology Matching
Semantic interoperability is a major issue for National Spatial Data Infrastructures (NSDIs), and mapping across heterogeneous databases is essential for such interoperability. Mapping of schemas based on ontology mapping provides opportunities for semantic translation of schema elements and hence for database queries across heterogeneous sources. Such semantics based mappings are usually human...
An Improved Semantic Schema Matching Approach
Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2], and schema integration. The task is defined as finding the semantic correspondences between elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...
iMAP: Discovering Complex Semantic Matches between Database Schemas
Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have focused on automating the matching process. To date, however, virtually all of these works deal only with one-to-one (1-1) matches, such as address = location. They do not consider the important cla...
Analyzing and revising data integration schemas to improve their matchability
Data integration systems often provide a uniform query interface, called a mediated schema, to a multitude of data sources. To answer user queries, such systems employ a set of semantic matches between the mediated schema and the data-source schemas. Finding such matches is well known to be difficult. Hence much work has focused on developing semi-automatic techniques to efficiently find the ma...